Exclusive two-level caching method for chip-multiprocessor, 
involves providing second tag state structure in second level cache so 
that simultaneous look-up at duplicate of the structures is possible 
Patent Assignee: BARROSO L A (BARR-I); GHARACHORLOO K (GHAR-I); NOWATZYK A 
(NOWA-I) 

Inventor: BARROSO L A; GHARACHORLOO K; NOWATZYK A 
Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No       Kind  Date      Applicat No    Kind  Date      Week
US 20020010836  A1    20020124  US 2000210655  P     20000609  200223 B
                                US 2000210675  P     20000610
                                US 2001877530  A     20010608

Priority Applications (No Type Date): US 2001877530 A 20010608; 
US 2000210655 P 20000609; US 2000210675 P 20000610 
Patent Details: 

Patent No       Kind  Lan  Pg  Main IPC     Filing Notes
US 20020010836  A1         15  G06F-012/00  Provisional application US 2000210655
                                            Provisional application US 2000210675

Abstract (Basic): US 20020010836 A1 

NOVELTY - The method involves providing in a second level cache a 
second tag state structure so that a simultaneous look-up at the 
duplicate of the first tag state structure and the second tag state 
structure is possible. A single owner is associated with a cache line 
at any given time during the lifetime of that cache line in the 
chip-multiprocessor. 

DETAILED DESCRIPTION - The first tag state structure is provided in 
a first level cache of the two level cache system. A duplicate of 
the first tag state structure is maintained in a second level cache 
of the two level cache system. INDEPENDENT CLAIMS are also included 
for the following: 

(a) a method for maximizing the use of on-chip cache memory 
capacity in a chip multiprocessor; and 

(b) a two-level cache system in a chip multiprocessor. 

USE - Used for exclusive two-level caching in a 
chip-multiprocessor. 

ADVANTAGE - Minimizes data replication and on-chip data traffic 
without incurring an increased second level hit latency or 
occupancy. Allows a substantially simultaneous look-up for data in 
the first level and second level tag state arrays. Eliminates 
redundant write-backs of evicted data to the second level cache. 

DESCRIPTION OF DRAWING (S) - The figure shows the chip 
multiprocessor design implementation. 
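
The look-up and ownership behaviour described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class name `L2Bank`, its fields, and the methods `lookup`, `fill_l1` and `evict_l1` are hypothetical names introduced here, and the parallel array reads of real hardware are modelled as two dictionary probes in the same step.

```python
from dataclasses import dataclass, field


@dataclass
class L2Bank:
    """Toy model: the L2 holds its own tags plus a duplicate of the L1 tags."""
    l2_tags: dict = field(default_factory=dict)      # line address -> data held in L2
    dup_l1_tags: dict = field(default_factory=dict)  # line address -> id of owning L1

    def lookup(self, addr):
        # Both tag structures are probed in the same step, standing in for
        # the simultaneous look-up of the duplicate L1 tags and the L2 tags.
        owner_l1 = self.dup_l1_tags.get(addr)
        in_l2 = addr in self.l2_tags
        if owner_l1 is not None:
            return ("L1", owner_l1)
        if in_l2:
            return ("L2", None)
        return ("MISS", None)

    def fill_l1(self, addr, cpu):
        # Assumed exclusive policy: a line moved into an L1 is dropped from
        # the L2, so the line has a single owner at any time.
        self.l2_tags.pop(addr, None)
        self.dup_l1_tags[addr] = cpu

    def evict_l1(self, addr, data):
        # An evicted L1 line moves to the L2, which becomes its owner.
        del self.dup_l1_tags[addr]
        self.l2_tags[addr] = data


bank = L2Bank()
bank.fill_l1(0x40, cpu=2)
print(bank.lookup(0x40))
```

In this model a line address never appears in both dictionaries at once, which is one simple way to express the single-owner property stated in the abstract.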

Data buffering apparatus in computer system, directs simultaneously 
portion of data transmitted by processor and that received by 
input-output device, to different buffers located at different levels in 
memory 

NOVELTY - Two selectors are provided to choose write and read 
buffers among multiple buffers to store data transmitted by a processor 
(300) and to store data to be received by an input-output device (700). A 
portion of the data transmitted by the processor to the write buffer and 
a portion received by the input-output device from the read buffer are 
simultaneously directed at different buffers located at different levels 
in memory. 

DETAILED DESCRIPTION - Multiple ring buffers (212,312,412) are 
allocated at different levels of the memory. The buffers are connected 
to the processor and input-output device, such that the processor and 
input-output device can simultaneously access the same buffer. The 
primary, secondary and tertiary levels of the memory are arranged on an 
IC with the processor, in off-chip cache and in DRAM, respectively. 
The selector selects the read buffers in the same order as the write 
buffers. Control values indicating the order for selecting read 
buffers are stored in software based registers stored in DRAM of the 
input-output device. The control values include the head pointer, tail 
pointer, length and base address associated with each of the 
buffers. An INDEPENDENT CLAIM is also included for a data buffering 
method. 

USE - For optimizing memory utilization in communication between 
processor and input-output device. 

ADVANTAGE - Makes use of hierarchical memory buffers to dynamically 
direct the data flow from the processor to the input-output device through 
the buffers. Optimizes use of memory of different levels in the memory 
hierarchy, as the need for buffer space dynamically varies during 
computer system operation. The use of multiple buffers and dynamic 
switching between buffers enhances the flow of data from processor to 
input-output device via the buffers. The switching technique permits 
the use of higher bandwidth cache based buffers with switches to the main 
memory buffer when the need for a larger buffer arises. Permits the 
relatively continuous flow of software produced data to be maintained 
while supporting the very high speed hardware mediated DMA reads of a 
graphics device or similar devices. Hence a computer system with greatly 
improved graphics output is offered. 

DESCRIPTION OF DRAWING (S) - The figure shows block diagram of 
primary components of data buffering apparatus. 

Ring buffers (212,312,412) 

Processor (300) 

Input-output devices (700) 
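
The buffer selection scheme can be sketched in software as follows. This is a minimal illustration under assumptions, not the claimed apparatus: `RingBuffer` and `BufferedChannel` are hypothetical names, the write selector policy (use the fastest level that still has room) is an assumption made here, and the read selector follows the write order by replaying a recorded sequence of buffer choices. The per-buffer control values (head pointer, tail pointer, length, base address) are the ones named in the abstract; `count` is an extra field added for simplicity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RingBuffer:
    base: int       # base address (illustrative only, not dereferenced)
    length: int     # number of slots
    head: int = 0   # next slot to read
    tail: int = 0   # next slot to write
    count: int = 0  # occupied slots (added here to tell full from empty)

    def __post_init__(self):
        self.slots = [None] * self.length

    def full(self):
        return self.count == self.length

    def write(self, item):
        self.slots[self.tail] = item
        self.tail = (self.tail + 1) % self.length
        self.count += 1

    def read(self):
        item = self.slots[self.head]
        self.head = (self.head + 1) % self.length
        self.count -= 1
        return item


class BufferedChannel:
    def __init__(self, buffers):
        self.buffers = buffers  # ordered fastest memory level first
        self.order = deque()    # buffer index chosen for each pending item

    def put(self, item):
        # Write selector (assumed policy): first level that has room.
        for i, buf in enumerate(self.buffers):
            if not buf.full():
                buf.write(item)
                self.order.append(i)
                return i
        raise RuntimeError("all buffers are full")

    def get(self):
        # Read selector: visit buffers in the same order the writer did.
        i = self.order.popleft()
        return self.buffers[i].read()


channel = BufferedChannel([RingBuffer(base=0x0, length=2),
                           RingBuffer(base=0x1000, length=4)])
levels = [channel.put(n) for n in range(5)]
print(levels, [channel.get() for _ in range(5)])
```

Because each ring buffer is first-in first-out and the read selector replays the writer's choices, items come out in the order they went in even when the writer spills from the small fast buffer into the larger one.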

Single chip with integrated processor and level two DRAM in 
multiprocessor system - maintains coherence between L2 cache having two 
port structure with data buffers for reload and store-back operation, by 
global directory with selectors and logic for directing 
cross-interrogates 

NOVELTY - Coherence between L2 caches is maintained by a global 
directory with selectors and logic for directing cross-interrogates to 
the appropriate L2 cache. Each L2 cache is provided with a pseudo two 
port structure with data buffers for reload and store-back operations. 

DETAILED DESCRIPTION - Each processor (511-51n) is provided with a 
private level one (L1) cache and associated translation-memory 
management logic to implement a set associative and late-select cache 
[illegible]. Each L1 cache is provided with a multiported cache directory 
for coherency maintenance through a fully shared snoopy protocol. 

L1 cache directories are interconnected through buses to allow 
simultaneous interrogation and update. The outputs of the L1 caches 
are interconnected by a selector-cross point switch for transferring 
data between caches. Each processor's L2 cache comprises a 
DRAM main memory and an SRAM buffer, and has translation-management 
logic to implement a set associative, late-select organization with DRAM 
directories. Logic and data circuits are provided for interfacing with 
an external memory management unit, which allows the single chip to 
work alone as a single node system or to be coupled to other 
identical nodes via an external controller. 

USE - The single chip fabricated with integrated processor and 
level two DRAM is provided for a multiprocessor system for a portable 
computer. 

ADVANTAGE - By integrating the processor and L2 DRAM cache on a 
single chip, high on-chip bandwidth, reduced latency and higher 
performance are achieved. Overcomes a number of issues which limit the 
performance and cost of the memory hierarchy by implementing several 
processors on a single chip which are either independent or shared using 
single chip technology. Eliminates chip-to-chip crossing, since the L2 DRAM 
cache is placed on the same chip. By improving bus utilization efficiency 
with the on-chip L2 cache DRAM array and eliminating trailing edge 
effects, system performance is improved. Reduces power consumption when 
compared to other approaches using a simple technique. Facilitates use of a 
combination of coherency protocols. Since each processor has a separate 
coherency bus path into each directory and the local directory controls 
access to all incoming requests, several requests are handled at 
directory level. 

DESCRIPTION OF DRAWING (S) - The figure depicts a block 
diagram of the multiprocessor system. 

Processor (511-51n) 
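
The role of the global directory in directing cross-interrogates can be pictured with a short sketch. It assumes, for illustration only, that the directory records which L2 caches hold each line; the abstract does not describe the directory's internal format, and `GlobalDirectory`, `record_fill` and `cross_interrogate_targets` are hypothetical names.

```python
class GlobalDirectory:
    """Toy directory: remembers which L2 caches hold each line address."""

    def __init__(self):
        self.holders = {}  # line address -> set of L2 cache ids

    def record_fill(self, addr, cache_id):
        # Note that cache_id now holds a copy of the line at addr.
        self.holders.setdefault(addr, set()).add(cache_id)

    def cross_interrogate_targets(self, addr, requester):
        # Only caches recorded as holding the line are interrogated,
        # and the requesting cache never interrogates itself.
        return sorted(self.holders.get(addr, set()) - {requester})


directory = GlobalDirectory()
directory.record_fill(0x80, cache_id=0)
directory.record_fill(0x80, cache_id=2)
print(directory.cross_interrogate_targets(0x80, requester=0))
```

The point of the sketch is that a request is forwarded only to the caches that may hold the line, rather than being broadcast to every L2 cache.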

Controller for two-way set associative cache - provides local storage 
for frequently accessed memory locations 

The cache controller (10) sits in parallel with a microprocessor 
bus (14, 15, 29) so as not to impede system response in the event of a 
cache miss. The cache controller tag RAM (24) is configured into two 
ways, each way including tag and valid-bit storage for associatively 
searching the directory for cache data-array addresses. The external 
cache memory (8) is organised such that both ways are simultaneously 
available to a number of available memory modules in the system, to 
allow the memory access to occur in parallel with the tag lookup. 

USE/ADVANTAGE - Provides for 0, 64k or 128k cache with a single 
socket and is cascadable with multiple sockets. Both the second-level 
cache and main memory can simultaneously look up the access address. 
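
A two-way tag look-up with tag and valid-bit storage per way can be sketched as below. The line size, the number of sets, and the names `TwoWayTagRam`, `split`, `lookup` and `fill` are assumptions chosen for illustration; the abstract does not give the controller's actual geometry. In hardware both ways would be compared at once, which the loop here only approximates.

```python
LINE_BYTES = 16   # assumed line size
NUM_SETS = 1024   # assumed number of sets per way


class TwoWayTagRam:
    def __init__(self):
        # One (valid, tag) entry per set in each of the two ways.
        self.ways = [[(False, 0)] * NUM_SETS for _ in range(2)]

    @staticmethod
    def split(addr):
        # Low bits select the byte in the line, middle bits the set,
        # and the remaining high bits form the tag.
        index = (addr // LINE_BYTES) % NUM_SETS
        tag = addr // (LINE_BYTES * NUM_SETS)
        return index, tag

    def lookup(self, addr):
        # Return the way that hits, or None on a miss.
        index, tag = self.split(addr)
        for way in range(2):
            valid, stored_tag = self.ways[way][index]
            if valid and stored_tag == tag:
                return way
        return None

    def fill(self, addr, way):
        index, tag = self.split(addr)
        self.ways[way][index] = (True, tag)


tag_ram = TwoWayTagRam()
tag_ram.fill(0x1234, way=1)
print(tag_ram.lookup(0x1234))
```

Two addresses that map to the same set but carry different tags can reside in the cache together, one in each way, which is the benefit of the two-way organisation.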